Figure 1. Urban environment classification with our approach. This paper is best viewed in color. Unless otherwise noted, the same color
code is used throughout the paper: brown for ground, red for facade, green for scatter, dark blue for pole/trunk, sky blue for wire.


Abstract 

This paper addresses the problem of assigning a label to
three-dimensional data points collected from laser scanners.
We are specifically interested in the application of
environment modeling for autonomous robot navigation in
natural and urban terrains. To capture contextual
information, we work within the Markov Random Field
framework. The approach used in this paper is a variant of
the Associative Markov Network (AMN), extended to learn
directionality in the clique potentials, resulting in a new
anisotropic model that can be learned efficiently using a
gradient-based method for non-differentiable functions. We
validate the proposed approach using data collected from
different range sensors.


1.  Introduction 

In this paper, we address the problem of automated
interpretation of 3-D point clouds from scenes of urban and
natural environments; our analysis is performed off-line,
from data acquired by two mobile mapping systems. An
example of our approach is illustrated in Figure 1 with five
commonly found object classes: ground, facade, scatter,
pole/trunk, and wire. We are interested in context-based 3-D
point classification where, in addition to local features, a
point's label is based on the label configuration of its
neighboring points. Markov Random Fields (MRFs) (Li, 1995)
constitute one of the options to account for neighboring
information. Such techniques have been shown to outperform
classifiers based only on local features (Lalonde et al.,
2007) but tend to smooth out small components in the scene.
To address this problem, we are interested in using an MRF
variant called an Associative Markov Network (AMN) (Taskar
et al., 2004).

AMNs and their variants in the literature (Anguelov
et al., 2005; Triebel et al., 2007, 2006) rely on local features
and isotropic contextual information. With the isotropic
model, the influence from surrounding points is based only
on their labels, regardless of their relative direction. We
propose to extend the AMN to account for local directional
information, thus producing an anisotropic model. The
directional information can come from the relative position
of the two points, from a non-geometric feature, or from
the local point topology. Our proposed approach is
different, as we will show, from using local directional
features. This natural extension is enabled by the recently
proposed subgradient method, which has been shown to solve
AMNs efficiently (Ratliff et al., 2007). Originally, learning
for AMNs was formulated as a quadratic program, which is
very memory intensive when applied to 3-D point cloud
processing; with the subgradient method, however, memory
requirements are only linear in the amount of training data,
thus allowing the development of a more expressive model.
We compare our model against the standard AMN and a linear
Support Vector Machine (SVM) (Joachims, 1999).

This paper reuses the formulation and some material
presented in (Munoz et al., 2008). The emphasis here is on
experimentation, and new results are presented, including
results produced using data from the Demo-III XUV
(Bornstein and Shoemaker, 2003).

The paper is structured into five sections. In the next
section, notation is introduced and background on the AMN
and the subgradient method is presented. The contributions
of the paper follow in Section 3 and the results in
Section 4. Section 5 concludes the paper.

2.  Associative  Markov  Network 

2.1.  Problem 

Following the notation from (Taskar et al., 2004),
our classification task can be formalized as follows. Given
a set of $N$ random variables $Y = \{Y_1, \ldots, Y_N\}$, where each
variable can take a value $Y_i \in \{1, \ldots, K\}$, find the
assignment of values $y = \{y_1, \ldots, y_N\}$ to $Y$ that maximizes
some scoring function. In the context of 3-D point
classification, each random variable represents a 3-D point and
its value corresponds to the label it can be assigned.
Formulating the classification task as a supervised learning
problem, we want to learn a discriminative model that
conditions the joint distribution on the features $x$ that we can
extract from the scene, $P_w(y|x)$, where $w$ are the model
parameters. The classification procedure is then broken into
two steps: (1) learning the model parameters given labeled
data $(x, y)$ and then (2) inferring the best assignments of a
novel scene given its features.

2.2.  Standard  AMN  formulation 

An MRF, also called a Markov Network, defines a
joint distribution for random variables $Y$; it is represented
as an undirected graph with $N$ nodes, one for each random
variable, and edges $E = \{(i, j) \mid i < j\}$ that define the
interactions between variables. Generally, a non-negative
potential function is defined for cliques of arbitrary size in
the graph; however, because efficient inference techniques
are required, the focus is generally on pairwise Markov
Networks. This model only defines a node potential $\phi_i(y_i)$
for each node $i$ and an edge potential $\phi_{ij}(y_i, y_j)$ for linked
nodes $i$ and $j$. These potentials measure the affinity (see
footnote 1) of the assignment to the variables in the cliques.
A log-linear model is used to represent the dependence of the
potentials on the features $x = \{x_i, x_{ij}\}$, where
$x_i \in \mathbb{R}^{d_n}$ and $x_{ij} \in \mathbb{R}^{d_e}$
are the features that describe node $i$ and the relationship
between nodes $i$ and $j$, respectively. The log of the node
potential is defined as $\log \phi_i(k) = w_n^k \cdot x_i$, where $k = y_i$ (the
label value of node $i$) and $w_n^k \in \mathbb{R}^{d_n}$ are the weights used
when a node is assigned $k$.

1 The affinity value is also referred to as the energy of the clique.


Under the AMN framework, a variant of the
Potts model is used that penalizes differing assignments
across an edge: $\forall k \neq l$, $\log \phi_{ij}(k, l) = w_e^{k,l} \cdot x_{ij} = 0$ and
$\log \phi_{ij}(k, k) \geq 0$, where $w_e^{k,l} \in \mathbb{R}^{d_e}$ are the weights used
when linked nodes are assigned $k$ and $l$. In order to
ensure non-negativity in the edge potentials, the feature and
weight vectors are constrained by $x_{ij} \geq 0$ and $w_e^k \geq 0$
(writing $w_e^k$ for $w_e^{k,k}$). Finally, changing the representation
of an assignment $y$ to a vector of $K \cdot N$ indicator variables,
$y = \{y_i^k\ \forall k, i \mid y_i^k = I(y_i = k)\}$, the log of the
joint conditional probability $\log P_w(y|x)$ is given by:

$$\sum_{i=1}^{N} \sum_{k=1}^{K} (w_n^k \cdot x_i)\, y_i^k + \sum_{(ij) \in E} \sum_{k=1}^{K} (w_e^k \cdot x_{ij})\, y_i^k y_j^k - \log Z_w(x) \qquad (1)$$

where $Z_w(x) = \sum_{y'} \prod_{i=1}^{N} \phi_i(y'_i) \prod_{(ij) \in E} \phi_{ij}(y'_i, y'_j)$ is the
partition function. Note that although this value is intractable to
compute, it does not depend on $y$, which is essential for
performing inference.

To abbreviate notation, define a $K(d_n + d_e)$-length
row vector $w = \{w_n, w_e\}$ with $w_n = \{w_n^1, \ldots, w_n^K\}$ and
$w_e = \{w_e^1, \ldots, w_e^K\}$. Also redefine $y$ to be a $K(N + |E|)$
column vector $y = \{y_n, y_e\}^T$ with $y_n = \{\ldots, y_i^1, \ldots, y_i^K, \ldots\}$
and $y_e = \{\ldots, y_{ij}^1, \ldots, y_{ij}^K, \ldots\}$, where $y_{ij}^k = y_i^k \wedge y_j^k$. Finally,
construct $X$ to be a $K(d_n + d_e) \times K(N + |E|)$ matrix such
that $\log P_w(y|x) = wXy - \log Z_w(x)$. This matrix contains
the features repeated multiple times in the columns and
padded with zeros appropriately.
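
As an illustration of the score in Equation 1, the sketch below evaluates the two sums for a given labeling and leaves out the $\log Z_w(x)$ term, since that term does not depend on $y$. The function name `amn_score` and the list-based data layout are assumptions made for this example; they are not part of the original formulation.

```python
def amn_score(labels, node_feats, edges, edge_feats, w_node, w_edge):
    """Unnormalized log-score of a labeling (Equation 1 without log Z).

    labels[i]      label k of node i
    node_feats[i]  feature vector x_i of node i
    edges          list of (i, j) pairs with i < j
    edge_feats[e]  non-negative feature vector x_ij of edge e
    w_node[k]      node weights for label k
    w_edge[k]      non-negative edge weights for label k
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Node term: each node contributes w_n^k . x_i for its assigned label k.
    score = sum(dot(w_node[labels[i]], x_i) for i, x_i in enumerate(node_feats))

    # Edge term: an edge contributes only when both endpoints share a label.
    for (i, j), x_ij in zip(edges, edge_feats):
        if labels[i] == labels[j]:
            score += dot(w_edge[labels[i]], x_ij)
    return score
```

Because disagreeing edges contribute nothing and agreeing edges contribute a non-negative amount, the score can only grow when neighbors agree, which is the associative behavior described above.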

Note that the inference task
$y^* = \arg\max_y P_w(y|x) = \arg\max_y wXy$ is an integer
program and is NP-hard. In (Taskar et al., 2004), the authors
show how to relax the integrality constraints on $y$, resulting
in a linear program that finds the optimal solution when
$K = 2$. For $K > 2$, a rounding procedure is performed that
achieves an approximation. The authors also state that
when $K = 2$, exact inference can be done by finding the
min-cut of a specially constructed graph because the
associative constraints on the negative edge potentials define a
submodular (see footnote 2) function (Kolmogorov and Zabih, 2004).
For $K > 2$, performing an iterative min-cut algorithm, called
$\alpha$-expansion, also achieves an approximation. We refer to
(Taskar et al., 2004) and (Kolmogorov and Zabih, 2004)
for more details.

Finding the optimal $w$ is formulated as a max-margin
learning problem. Given labeled data $(x, \hat{y})$, the
goal is to find the weights that maximize the margin of
confidence in $P_w(\hat{y}|x)$ versus $P_w(y|x)$, $\forall y \neq \hat{y}$. This learning
problem is formulated as the following convex program:

$$\min_{w, \xi}\ \frac{1}{2}\|w\|^2 + \xi \qquad \text{s.t.} \quad wX\hat{y} + \xi \geq \max_{y}\, wXy + L(y) \qquad (2)$$

2 A function $E(\alpha, \beta)$ of two binary variables is submodular if and only
if $E(0,0) + E(1,1) \leq E(0,1) + E(1,0)$.

where $\xi$ is a slack variable that represents the gap in the
total energy between the optimal and achieved solutions
and $L(y)$ is a loss function which measures the error of
classification. As in (Taskar et al., 2004) and (Anguelov
et al., 2005), we use the Hamming distance between the
true and achieved assignments for our loss function. In
(Taskar et al., 2004), the authors show how to substitute
the dual of the inference LP to bound the non-linear
constraint, which results in a valid quadratic program that
can be solved by optimization software. Again, we
refer to (Taskar et al., 2004) for more details.

2.3.  Subgradient  method  for  learning 

In (Ratliff et al., 2006, 2007), the authors show that
it is possible to solve Program 2 by moving the constraint
into the objective function, since the constraint holds with
equality at the optimum, and then taking the subgradient of
the resulting objective function. Thus, with $\hat{y}$ denoting the
true labeling, the AMN regularized cost function is:

$$c(w) = \frac{\lambda}{2}\|w\|^2 + \max_{y}\,(wXy + L(y)) - wX\hat{y} \qquad (3)$$

The key to computing the subgradient of Equation
3 is the following property: if $f(a, b)$ is differentiable in $a$,
then $\nabla_a f(a, b^*)$ is a subgradient of the convex function
$\max_b f(a, b)$ for $b^* \in \arg\max_b f(a, b)$. Therefore, a
subgradient $g_w \in \partial c(w)$ is:

$$g_w = \lambda w + Xy^* - X\hat{y}$$

As previously mentioned, solving $\max_y (wXy + L(y))$
can be done with graph cuts or an LP. Starting with
$w = 0$, the solution is then reached through descent until
convergence, or for $T$ iterations, using the update rule at time
$t$:

$$w_{t+1} = \mathcal{P}_W[w_t - \alpha g_{w_t}]$$

where $\mathcal{P}_W$ projects $w$ onto a convex set $W$ formed by any
specific convex constraints on $w$; for AMNs, this projection
sets any negative component of $w_e$ to 0. Typical step sizes
are $\alpha = c/t$ and $\alpha = c/\sqrt{t}$, for some positive $c$.
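
The update rule above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the helper names (`joint_features`, `loss_augmented_argmax`, `train_amn`) are hypothetical, the loss-augmented inference step is done by brute-force enumeration instead of graph cuts or an LP (so it is only usable on graphs with a handful of nodes), and the step size $c/\sqrt{t}$ is an arbitrary pick for the example.

```python
import itertools
import numpy as np

def joint_features(labels, node_feats, edges, edge_feats, K):
    """Return the vector X y: per-label sums of node and agreeing-edge features."""
    f_node = np.zeros((K, node_feats.shape[1]))
    f_edge = np.zeros((K, edge_feats.shape[1]))
    for i, k in enumerate(labels):
        f_node[k] += node_feats[i]
    for (i, j), x_ij in zip(edges, edge_feats):
        if labels[i] == labels[j]:      # associative: only agreeing edges count
            f_edge[labels[i]] += x_ij
    return np.concatenate([f_node.ravel(), f_edge.ravel()])

def loss_augmented_argmax(w, y_true, node_feats, edges, edge_feats, K):
    """Brute-force argmax_y (w X y + L(y)) with Hamming loss L; tiny graphs only."""
    best, best_val = None, -np.inf
    for y in itertools.product(range(K), repeat=len(y_true)):
        hamming = sum(a != b for a, b in zip(y, y_true))
        val = w @ joint_features(y, node_feats, edges, edge_feats, K) + hamming
        if val > best_val:
            best, best_val = y, val
    return best

def train_amn(y_true, node_feats, edges, edge_feats, K, lam=0.005, c=1.0, T=50):
    """Projected subgradient descent on the regularized cost of Equation 3."""
    n_node_w = K * node_feats.shape[1]
    w = np.zeros(n_node_w + K * edge_feats.shape[1])
    f_true = joint_features(y_true, node_feats, edges, edge_feats, K)
    for t in range(1, T + 1):
        y_star = loss_augmented_argmax(w, y_true, node_feats, edges, edge_feats, K)
        f_star = joint_features(y_star, node_feats, edges, edge_feats, K)
        g = lam * w + f_star - f_true               # subgradient of the cost
        w = w - (c / np.sqrt(t)) * g                # step size c / sqrt(t)
        w[n_node_w:] = np.maximum(w[n_node_w:], 0)  # projection: edge weights >= 0
    return w
```

Only the feature sums of the current and true labelings are kept in memory at each step, which reflects why the memory footprint of this approach grows linearly with the training data.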

3.  Directional  Associative  Markov  Network 
3.1.  Motivation 

Applications of AMNs to 3-D point cloud
classification have been shown to perform well when classifying
large, dominant structures in the scene such as vegetation,
buildings or walls, and the ground plane (Triebel et al., 2006;
Anguelov et al., 2005). However, in most urban environments,
there exist finer objects such as branches, posts, utility
poles, and power-lines that are harder to perceive with laser
scanners. In addition, these labels prove more challenging to
classify when in the vicinity of data from more dominant
labels, such as vegetation, because the AMN prefers to
maintain the same labels spatially. Observe that Equation 1
is maximized when the labels of two nodes in an edge
potential agree and the combination of the features and
corresponding chosen weights is highest. Thus, when
indicative features for the label cannot be computed, the
label assignment is chosen to agree with its surroundings,
which may smooth away the small structures we are
interested in.

3.2.  Directionality 

By accounting for directional information when
computing our edge potentials, we propose to address the
limitations presented above. A basic way to accomplish
this is to utilize the edge orientation when computing the
energy. However, for 3-D point cloud processing the edge
orientation is not expressive enough, as the created edges
depend on the point density. Fortunately, most objects
in the world have an associated and well-defined
direction that we can estimate. For example, tree trunks
generally grow vertically, power-lines usually lie horizontally,
and we can estimate a local tangent vector at each point for
both labels. Our goal is to utilize this intrinsic information in
our model so that a node's context accounts for its
neighbors' local directions in addition to their labels. The idea
behind this approach is to create a more expressive model
that learns how to classify the data correctly when the
estimated features, and consequently the estimated local
direction, are in a less separable or lower density region
of the feature space. That is, we do not learn a single set
of weights that tries to best model, overall, one class's
features. Instead, we want to account for variation in feature
estimation and learn multiple sets of weights for different
locations in feature space that best model the class. By
incorporating directional information in the AMN
framework, we show how we can better preserve these smaller
structures and improve the overall classification rate.

3.3.  Anisotropic  model 

The standard AMN formulation is an isotropic
model; that is, regardless of the orientation of the edge,
the potentials are computed in the same manner. We
propose using an anisotropic model where the weights chosen
to compute the edge potentials depend on its label and
defined direction; we call this new model a Directional AMN.
We note that our approach extends to cliques of arbitrary
size and is not limited to those of size two. The directional
information is obtained by comparing a clique's intrinsic
direction against a predefined reference direction when the
clique is labeled $k$. The resulting angle between the
intrinsic and reference directions is then binned. In addition to
the label, the binned angle determines the set of weights
used to compute the clique potential, thus producing an
anisotropic model. Figure 2 illustrates the following
explanation of computing an anisotropic edge potential when its
nodes are labeled $k$. For the two linked nodes $(n_i, n_j)$, an
intrinsic direction is computed that describes the
direction of the clique (edge) when its nodes are labeled $k$. This
intrinsic direction can be defined arbitrarily. For example,
the intrinsic direction could simply be the direction of the
edge ($d_e$); however, as previously mentioned, this would
not provide much utility. Another example is to define a
local feature direction for each node ($d_f$) that describes the
local direction when labeled $k$, such as the normal vector
when fitting a plane, and then define the clique's intrinsic
direction to be a function of each node's feature direction.
The reference direction can be an absolute direction,
such as the vertical axis, or based on the local point cloud
topology.


Figure  2.  Directionality  illustration. 

It is important to note that the anisotropic model is
different from an isotropic model with directional
information in the feature space; Figure 3 illustrates this claim. In
this example, two artificial data sets were generated that
contain two intersecting lines, parallel to the x-y plane,
surrounded by randomly generated scattered points
at two different locations. Note that this synthetic point
cloud configuration mimics a common natural scene where
power-lines are embedded in the vegetation. In the training
set, illustrated in Figure 3-(a), the scattered points lie at the
extremity of the lines; in the testing set, illustrated in
Figure 3-(b), the scattered points are moved to the
intersection of the lines. In this example we use a standard and a
Directional AMN with the features defined in Section 4.3.
Figure 3-(c) shows that the standard AMN smooths out
the classes we are interested in, while Figure 3-(d) shows
that the Directional AMN does a better job of
preserving the small linear structure while increasing the overall
classification rate.

3.4.  Directional  AMN  formulation 

Incorporating the anisotropic potentials involves
modifying the higher-order clique potentials from the
original formulation, that is, modifying the edge potentials in
the pairwise model. These clique potentials must now
include a direction term.
For each label $k$, we parameterize a direction by binning
the possible angle-space formed by the intrinsic direction
against the reference direction when all nodes in the clique

Figure 3. Difference between directional features and directional
potentials, with the line and scatter points in blue and green,
respectively. (a) Training data. (b) Ground truth for the testing
data. (c) Standard AMN. (d) Directional AMN.


are labeled $k$. Remember that the intrinsic and reference
directions are specific to each label. We denote the set of
bins that constitute this space for label $k$ as $\Theta^k$. Note that
the number of bins $|\Theta^k|$ in each label's angle-space is
not necessarily the same. Therefore, the weight vector chosen
when computing the clique potential depends on the
clique's label $k$ and on the bin $\theta \in \Theta^k$ that the angle
between the intrinsic and reference directions falls into
for label $k$. In the pairwise model, the anisotropic edge
potentials are then defined as $\log \phi_{ij}(k, k) = w_e^{k,\theta} \cdot x_{ij} \geq 0$, where
$\theta \in \Theta^k$ is the computed bin, and $\forall k \neq l$, $\log \phi_{ij}(k, l) = 0$.
Incorporating these changes, $\log P_w(y|x)$ is proportional to:

$$\sum_{i=1}^{N} \sum_{k=1}^{K} (w_n^k \cdot x_i)\, y_i^k + \sum_{(ij) \in E} \sum_{k=1}^{K} \sum_{\theta \in \Theta^k} (w_e^{k,\theta} \cdot x_{ij})\, y_i^k y_j^k\, \Omega_{ij,k}^{\theta} \qquad (4)$$

where $\Omega_{ij,k}^{\theta}$ is an indicator function defined to be one if the
nodes in edge/clique $(ij)$ are both labeled $k$ and the angle
between its intrinsic and reference directions lies in bin
$\theta \in \Theta^k$.
As done with the standard AMN, we can relax
Equation 4 into a linear combination. This is achieved by
introducing indicator variables $y_{ij}^{k,\theta} = y_i^k \wedge y_j^k \wedge \Omega_{ij,k}^{\theta}$,
where $\Omega_{ij,k}^{\theta}$ is the indicator function from Equation 4, and
redefining $y_e$ to be a $|E| \cdot K \cdot \sum_{k=1}^{K} |\Theta^k|$ length
indicator vector: $y_e = \{\ldots, y_{ij}^{k,1}, \ldots, y_{ij}^{k,|\Theta^k|}, \ldots\}$. Similarly,
redefine $w_e$ to be a $K \cdot d_e \cdot \sum_{k=1}^{K} |\Theta^k|$ length vector:
$w_e = \{\ldots, w_e^{k,1}, \ldots, w_e^{k,|\Theta^k|}, \ldots\}$. Appropriately redefining $X$,
we can now rewrite Equation 4 in matrix form $wXy$ and
solve the new model using the subgradient method as
before. Inference is easily performed through the min-cut
framework with the $\alpha$-expansion algorithm (Boykov and
Kolmogorov, 2004). At each expansion step, we compute
the potential of each clique. If all the nodes' labels in a
clique agree, then the associated intrinsic and reference
directions are determined for that clique and label. Using
the resulting computed bin and label, the appropriate set of
weights is then selected to compute the potential.
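
The weight selection described in this section can be sketched as follows. This is a minimal illustration, assuming that the intrinsic direction of the edge has already been computed, that the reference is given as a direction vector per label, and that bins are specified by their inclusive upper edges; the names `angle_to_reference`, `bin_index`, and `directional_edge_potential` are hypothetical.

```python
import math

def angle_to_reference(direction, reference):
    """Unsigned angle in [0, pi/2] between two non-zero directions (sign ignored)."""
    dot = sum(a * b for a, b in zip(direction, reference))
    norm = math.sqrt(sum(a * a for a in direction)) * \
           math.sqrt(sum(b * b for b in reference))
    return math.acos(min(1.0, abs(dot) / norm))

def bin_index(angle, upper_edges):
    """Index of the first bin whose inclusive upper edge is >= angle."""
    for idx, upper in enumerate(upper_edges):
        if angle <= upper:
            return idx
    return len(upper_edges) - 1   # guard against round-off past the last edge

def directional_edge_potential(label_i, label_j, x_ij, intrinsic_dir,
                               reference_dirs, bin_edges, w_edge):
    """Log edge potential: zero if labels differ, else w_e^{k,theta} . x_ij."""
    if label_i != label_j:
        return 0.0
    k = label_i
    angle = angle_to_reference(intrinsic_dir, reference_dirs[k])
    theta = bin_index(angle, bin_edges[k])      # bins are specific to label k
    return sum(w * x for w, x in zip(w_edge[k][theta], x_ij))
```

For instance, with two bins whose upper edges are `[math.pi / 6, math.pi / 2]` and a vertical reference direction, a vertical intrinsic direction selects the first weight vector and a horizontal one selects the second.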

4.  Experiments

4.1.  Data  sets  and  features 

The results presented below were obtained from
data collected with two different mapping systems: a
vehicle equipped with a set of SICK lasers, and the Demo-III
XUV. In both cases the vehicle was used to collect spatially
aligned data and no data processing occurred onboard the
vehicle. The first data set, referred to as the "push-broom"
data set, was produced using a set of static SICK lasers mounted
on a moving platform equipped with a navigation system.
The vehicle drove in an urban environment at up to 20
km/h. The second data set, referred to as the "XUV" data set,
was produced using the Demo-III XUV equipped with a 3-D
mobility ladar mounted on a turret at the front of the XUV.
The Demo-III XUV was tele-operated in forested
environments and a mock-up urban environment at 2 m/s.

The various data sets were hand labeled
systematically into more than fifty different classes. Labels were
then filtered out or collapsed into one of five labels (wire,
pole/trunk, scatter, ground, and facade). A total of half a
million 3-D points were labeled and used to produce the
results with ground truth for this paper. For the
"push-broom" data set, more than five million 3-D points,
corresponding to more than two kilometers traversed, were
classified and analyzed. In the "XUV" data set, the data
were first collapsed into voxels with 10 cm edges. Half a
million voxels were labeled out of a total of 140 million voxels
collected over 10 km of traverse.

We implemented three geometric features
commonly used in spectral analysis of point clouds. We
define $\lambda_2 \geq \lambda_1 \geq \lambda_0$ to be the eigenvalues of the scatter
matrix $M$ defined over a local neighborhood $\mathcal{N}_p$ around point
$p$. These features capture the {point, surface, linear}-"ness"
of the local geometry: $\{\sigma_p = \lambda_0,\ \sigma_s = \lambda_1 - \lambda_0,\ \sigma_l = \lambda_2 - \lambda_1\}$,
respectively. We will refer to these as the spectral
features. Next, we estimate the local tangent $v_t$ and normal
$v_n$ vectors for each point by using the principal and least
principal eigenvectors of $M$, respectively. We then compute
the cosine and sine of the angles formed between the
directions of $v_t$ and $v_n$ and the vertical and horizontal plane,
resulting in four values. Depending on the local
neighborhood, however, the estimated directions may be arbitrary.
We therefore estimate a confidence by scaling the values
computed from $\{v_t, v_n\}$ by $\{\sigma_l, \sigma_s\} / \max(\sigma_l, \sigma_p, \sigma_s)$, respectively.
We will refer to these scaled values as the directional features. The
actual node and edge features used for each experiment are
defined in the corresponding subsection.
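
A possible way to compute the spectral features and the tangent and normal estimates is sketched below with NumPy. The paper does not spell out how the scatter matrix is normalized; the sketch assumes the covariance of the neighborhood points about their centroid, and the function name `spectral_features` is hypothetical.

```python
import numpy as np

def spectral_features(neighbors):
    """Spectral features and direction estimates from one local neighborhood.

    neighbors: (n, 3) array-like of the 3-D points around the query point.
    Returns ((sigma_p, sigma_s, sigma_l), v_t, v_n).
    """
    pts = np.asarray(neighbors, dtype=float)
    centered = pts - pts.mean(axis=0)
    # Assumed scatter matrix: covariance of the neighborhood about its centroid.
    M = centered.T @ centered / len(pts)
    # eigh returns eigenvalues in ascending order: lambda_0 <= lambda_1 <= lambda_2.
    eigvals, eigvecs = np.linalg.eigh(M)
    l0, l1, l2 = eigvals
    sigma_p = l0          # point-ness
    sigma_s = l1 - l0     # surface-ness
    sigma_l = l2 - l1     # linear-ness
    v_n = eigvecs[:, 0]   # least principal eigenvector: normal estimate
    v_t = eigvecs[:, 2]   # principal eigenvector: tangent estimate
    return (sigma_p, sigma_s, sigma_l), v_t, v_n
```

Points sampled along a straight line yield a dominant linear-ness value, while points sampled on a plane yield a dominant surface-ness value; the sign of the returned eigenvectors is arbitrary.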

4.2.  Model  parameters  and  timing 

Optimal parameters were obtained by maximizing
the classification rate on various labeled data sets. For the
results reported on both data sets, we obtained the
subgradient parameters $\lambda = 0.005$ and α = ^. For the "sweeping"
data, $T = 500$, and for the "push-broom" data, $T = 800$.
The neighborhood $\mathcal{N}_p$ was defined with a radius of 0.6 m for
the "push-broom" data; we disregard points where $|\mathcal{N}_p| < 4$.

Results were computed on an Intel(R)-based 2.40
GHz processor with 4 GB of RAM. We present a timing
analysis on the "push-broom" data set. The training set consisted
of a graph with 18 898 nodes and 55 507 edges.
Training took 151 minutes for the Directional AMN versus 148
minutes for the standard AMN. The ground truth testing
set consisted of a graph with 385 611 nodes and 1 077 968
edges; 4 690 points were disregarded due to neighborhood
size. On the test data set, feature computation and graph
construction completed in under 6.5 minutes combined.
Inference required 9.3 minutes for the Directional AMN
versus 9 minutes for the standard AMN.

We constructed the graphs by iterating over the
nodes and linking each node to its five nearest neighbors.
We observed that facade had the fewest interactions
with the other labels while scatter had the most.
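
The graph construction can be sketched as a brute-force nearest-neighbor search. This is only an illustration: the function name `knn_edges` is hypothetical, the quadratic search would be replaced by a spatial index on point clouds of the size used here, and storing each undirected edge once as $(i, j)$ with $i < j$ is an assumption consistent with the edge set definition of Section 2.2.

```python
import math

def knn_edges(points, k=5):
    """Link each point to its k nearest neighbors; return unique (i, j), i < j."""
    edges = set()
    for i, p in enumerate(points):
        # Distances from point i to every other point, closest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))   # undirected edge stored once
    return sorted(edges)
```

Because mutual neighbors produce the same undirected edge, the number of edges is at most `k` times the number of nodes and usually smaller.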

4.3.  Classifying  the  ”push-broom”  data  set 

We compare the Directional AMN (facade, ground,
pole/trunk, wire) against the standard AMN. For the facade
label, the angles between $v_n$ and the horizontal plane are
binned into $\{[0, \pi/6], (\pi/6, \pi/2]\}$. Anisotropic potentials are
also defined for the pole/trunk and wire labels, binning the
angle between $v_t$ and the horizontal plane and between $v_t$
and the vertical, respectively, into $\{[0, \pi/6], (\pi/6, \pi/2]\}$. For
these results we found that using the directional features in
both models increased performance, and we note that the
Directional AMN performed better in both cases. For the edge
features, we concatenate the spectral features of the two
linked nodes and compute a similarity feature for the
directional features. This similarity feature is defined to be
$1/(1 + |df_i - df_j|)$, where $df_i$ is a directional feature of node $i$.
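
A minimal sketch of this edge feature construction follows. It assumes one similarity value per directional feature and a simple concatenation order; the function name `edge_features` is hypothetical.

```python
def edge_features(spectral_i, spectral_j, directional_i, directional_j):
    """Edge feature vector x_ij for two linked nodes i and j.

    Concatenates both nodes' spectral features, then appends one similarity
    value 1 / (1 + |df_i - df_j|) per pair of directional features.
    """
    similarity = [1.0 / (1.0 + abs(a - b))
                  for a, b in zip(directional_i, directional_j)]
    return list(spectral_i) + list(spectral_j) + similarity
```

Each similarity value lies in (0, 1] and the spectral features are non-negative, so the resulting vector respects the constraint $x_{ij} \geq 0$ required by the AMN edge potentials.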

Figure 4 shows results on part of the section used
for quantitative performance evaluation. Note the close-up
view of the pole and wires, which are correctly labeled.
Points not belonging to the five labels used for this
evaluation were filtered out of the fully labeled ground truth
data. We chose this approach in order to compare fairly the
classification results of the standard and Directional AMN
with different features.

Table 1 presents the recall and precision for the
Directional AMN and the standard AMN, computed over the
subset with ground truth (over 390 000 points). As shown,
the Directional AMN produces better precision and
recall than the standard AMN for all labels. The most
common classification error is due to point density variation.
This is clear from the precision of the wire and pole/trunk
labels. Sections of ground far from the sensor tend to be
mislabeled as wire. Low density coupled with occlusions
causes facade points to be mislabeled as pole/trunk. The
second common source of error is the inability of the
features to capture the scene. For example, bundled wires are
misclassified as facade in Figure 4. A second example is
presented in Figure 5.

Figure 4. "Push-broom" data set. First example of the classification of part of the ground truth subset into five labels. Top, scene overview.
Bottom: left and center, scene picture, from Google Street View; right, classification close-up view.

Figure 5. "Push-broom" data set. Second example of the classification of part of the ground truth subset into five labels. Left, Directional
AMN; center and right, scene picture, from Google Street View.


Label           Recall          Precision
scatter         0.881 (0.856)   0.974 (0.973)
wire            0.789 (0.778)   0.125 (0.124)
pole/trunk      0.926 (0.899)   0.287 (0.230)
load bearing    0.949 (0.945)   0.982 (0.963)
facade          0.786 (0.672)   0.908 (0.865)

Table 1. "Push-broom" data set. Precision and recall for the
Directional AMN, with the standard AMN values in parentheses.
The overall classification rate is 91.66% for the Directional AMN
versus 89.67% for the standard AMN on the same features.
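
For reference, the per-label figures in Table 1 can be computed as sketched below, assuming the usual definitions of precision and recall and taking the overall classification rate to be the fraction of correctly labeled points. The function names are hypothetical.

```python
def precision_recall(y_true, y_pred, label):
    """Precision and recall of one label from true and predicted label lists."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    predicted = sum(p == label for p in y_pred)   # points assigned this label
    actual = sum(t == label for t in y_true)      # points truly of this label
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

def classification_rate(y_true, y_pred):
    """Fraction of points whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

A label with few true points, such as wire, can combine a high recall with a low precision when a small fraction of a much larger class is mislabeled as that label.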

We also processed the subsection of the "push-broom"
data set without ground truth, over 4.5 million 3-D points.
In this case, all scene elements from the raw data are
present. We present results for the best classifier, the
Directional AMN; qualitatively the classifier performs well,
as shown in Figure 6 and Figure 7. Objects not part of the
training data, such as traffic signs and traffic lights and their
support posts, are assigned to the closest geometrical
label, facade and linear, respectively.

4.4.  Classifying  the  ”XUV”  data  set 

We present here preliminary qualitative results
obtained with data collected using the Demo-III XUV in
urban (Figure 8) and natural (Figure 9) environments. Note that
the Demo-III XUV could not be deployed in the same
environment as the one used for the "push-broom" data set, and
that the available environment did not contain power lines.
In Figure 8, the utility pole is segmented correctly, as are
the ground, the facade, and the small retaining wall on the
right-hand side of the image. The columns are also
classified correctly. A noticeable error is the misclassification of
the junction with the ground as "foliage". A quantitative
analysis of these results is not yet available. In Figure 9, the
short grass tends to be classified as (rough) ground, while the
Jersey barrier is not segmented because of clutter and
occlusion by vegetation.

Figure 6. "Push-broom" data set. Classification on raw data with
five labels: left, standard AMN; center, Directional AMN; right,
scene picture, from Google Street View.

Figure 7. "Push-broom" data set. Classification on raw data with five labels. Top: left, standard AMN; center, Directional AMN; right,
scene picture, from Google Street View.

5.  Conclusion 

In this paper we present a contribution to the
problem of automated 3-D point cloud classification for scene
interpretation. We extend the standard Associative Markov
Network model to account for directional information, thus
producing a new anisotropic model capable of accurately
representing more complex scene structures than before.
Recent developments in optimization with the subgradient
method have allowed us to develop and learn this more
complex model. We show how the proposed Directional
AMN is different from using directional features with the
standard AMN formulation. The approach is validated
using data accumulated by two different mobile mapping
systems. We produced quantitative performance evaluations
on a very large manually labeled set (over 400 000 points)
and qualitative evaluations on the remaining data, for a total
of more than 12 km of terrain traversed. We are currently
integrating this approach onboard the Demo-III XUV for on-line,
on-board data processing for environment modeling.